
FIGURE 3.25
Evolution of the binarized values |x| during the XNOR-Net and BONN training processes. Both models are based on WRN-22 (2nd, 3rd, 8th, and 14th convolutional layers), and the curves do not share the same y-axis. The binarized values of XNOR-Net tend to converge to small and similar values, whereas those of BONN are learned diversely.

a learning rate schedule that decays the learning rate to 10% of its value every 30 epochs. As shown in Table 3.6, our Bayesian feature loss further boosts the performance of real-valued models by a clear margin. Specifically, it improves the Top-1 accuracy of ResNet-18 and ResNet-50 by 0.6% and 0.4%, respectively.
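The schedule above is a standard step decay. As a minimal sketch, assuming a PyTorch training setup (which the text does not specify; the model and optimizer below are placeholders), the rule "decay to 10% every 30 epochs" can be written as:

import torch

model = torch.nn.Linear(10, 10)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 (i.e., decay to 10%) every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one training epoch would run here ...
    scheduler.step()  # update the learning rate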

3.8 RBONN: Recurrent Bilinear Optimization for a Binary Neural Network

We first briefly introduce bilinear models in deep learning. Bilinear models arise in CNNs in several settings. One important application is network pruning, among the most active topics in the deep learning community [142, 162]: vital feature maps and their related channels are selected with bilinear models [162], and iterative methods such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [141] and the Accelerated Proximal Gradient (APG) method [97] can be used to prune bilinear-based networks. Many other deep learning applications, such as fine-grained categorization [146, 133], visual question answering (VQA) [278], and person re-identification [214], benefit from embedding bilinear models into CNNs, where they model pairwise feature interactions and fuse multiple features with attention.
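To make the pairwise-interaction idea concrete, the following is a minimal PyTorch sketch of bilinear pooling, not the chapter's own implementation; the signed square root and L2 normalization follow common practice in bilinear CNNs for fine-grained categorization:

import torch
import torch.nn.functional as F

def bilinear_pool(x, y):
    """Outer product of two feature maps at each spatial location,
    averaged over locations: a (C1 x C2) map of pairwise channel
    interactions per sample.
    x: (batch, C1, H, W), y: (batch, C2, H, W)."""
    b, c1, h, w = x.shape
    c2 = y.shape[1]
    x = x.reshape(b, c1, h * w)
    y = y.reshape(b, c2, h * w)
    z = torch.bmm(x, y.transpose(1, 2)) / (h * w)  # (b, c1, c2)
    z = z.reshape(b, c1 * c2)
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-12)  # signed square root
    return F.normalize(z, dim=1)                     # L2 normalization

Setting y = x recovers the self-bilinear pooling used in fine-grained categorization, while distinct x and y correspond to fusing two feature sources, as in VQA.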

Previous methods [77, 148] enhance the representation capability of BNNs by computing scaling factors that approximate the real-valued weight filter $\mathbf{w}$ as $\mathbf{w} \approx \alpha \circ \mathbf{b}_{\mathbf{w}}$, where $\alpha \in \mathbb{R}^{+}$ is the scaling factor (vector) and $\mathbf{b}_{\mathbf{w}} = \operatorname{sign}(\mathbf{w})$ (a minimal sketch of this approximation is given after Table 3.6). In essence, the approximation

TABLE 3.6
Effect of the Bayesian feature loss on the ImageNet dataset. The backbones are real-valued ResNet-18 and ResNet-50.

Model        Bayesian feature loss    Top-1 (%)    Top-5 (%)
ResNet-18                             69.3         89.2
ResNet-18    ✓                        69.9         89.8
ResNet-50                             76.6         92.4
ResNet-50    ✓                        77.0         92.7
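As referenced above, the following is a minimal PyTorch sketch of the scaling-factor approximation $\mathbf{w} \approx \alpha \circ \mathbf{b}_{\mathbf{w}}$. The choice of $\alpha$ as the mean absolute value per output filter is the closed-form least-squares solution used by XNOR-Net [77]; the tensor shape below is an assumption for illustration only:

import torch

def binarize_weights(w):
    """Approximate each real-valued filter by alpha * sign(w).
    w: (out_channels, in_channels, kH, kW) convolution weights."""
    b_w = torch.sign(w)                                # b_w = sign(w)
    alpha = w.abs().mean(dim=(1, 2, 3), keepdim=True)  # one scalar per filter
    return alpha, b_w

w = torch.randn(64, 32, 3, 3)
alpha, b_w = binarize_weights(w)
w_hat = alpha * b_w                   # w ≈ alpha ∘ b_w
print((w - w_hat).norm() / w.norm())  # relative approximation error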